O(N)-Space Spatiotemporal Filter for Reducing Noise in Neuromorphic Vision Sensors
Neuromorphic vision sensors are an emerging technology inspired by how the retina processes images. A neuromorphic vision sensor only reports when a pixel value changes rather than continuously outputting the value every frame as is done in an 'ordinary' Active Pixel Sensor (APS). This move from a continuously sampled system to an asynchronous, event-driven one effectively allows for much faster sampling rates; it also fundamentally changes the sensor interface. In particular, these sensors are highly sensitive to noise, as any additional event reduces the bandwidth and thus effectively lowers the sampling rate. In this work we introduce a novel spatiotemporal filter with O(N) memory complexity for reducing background activity noise in neuromorphic vision sensors. Our design consumes 10× less memory and achieves a 100× reduction in error compared to previous designs. Our filter is also capable of recovering real events and can pass up to 180 percent more real events.
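The core of the O(N) memory footprint is to keep one small memory cell per row and per column of the sensor instead of one timestamp per pixel. As a rough illustration only, the Python sketch below implements a background-activity filter in that style; the class name, the 5 ms correlation window, and the exact neighborhood test are assumptions made for this example, not the paper's verified design.

```python
import numpy as np

class RowColBackgroundFilter:
    """Spatiotemporal background-activity filter with O(N) memory:
    one (timestamp, coordinate) cell per column and per row instead of
    one timestamp per pixel (which would be O(N^2))."""

    def __init__(self, width, height, dt_us=5000):
        self.dt = dt_us  # correlation window in microseconds (illustrative value)
        self.col_ts = np.zeros(width, dtype=np.int64)   # last event time seen in each column
        self.col_y = np.full(width, -1)                 # ...and its row coordinate
        self.row_ts = np.zeros(height, dtype=np.int64)  # last event time seen in each row
        self.row_x = np.full(height, -1)                # ...and its column coordinate

    def __call__(self, x, y, t):
        """Return True if the event at (x, y, t) is supported by a recent
        neighboring event stored in the row/column memories."""
        keep = (
            (self.col_y[x] >= 0 and abs(int(self.col_y[x]) - y) <= 1
             and t - self.col_ts[x] <= self.dt)
            or
            (self.row_x[y] >= 0 and abs(int(self.row_x[y]) - x) <= 1
             and t - self.row_ts[y] <= self.dt)
        )
        # Always refresh the memories so later events can correlate with this one.
        self.col_ts[x], self.col_y[x] = t, y
        self.row_ts[y], self.row_x[y] = t, x
        return bool(keep)

# Example: 346x260 sensor; isolated noise events are rejected, correlated ones pass.
filt = RowColBackgroundFilter(346, 260)
print(filt(100, 50, t=1_000))   # first event at this location: no support yet -> False
print(filt(100, 51, t=2_000))   # neighbor within the window -> True
```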
A Framework for Designing Efficient Deep Learning-Based Genomic Basecallers
Nanopore sequencing generates noisy electrical signals that need to be
converted into a standard string of DNA nucleotide bases using a computational
step called basecalling. The accuracy and speed of basecalling have critical
implications for all later steps in genome analysis. Many researchers adopt
complex deep learning-based models to perform basecalling without considering
the compute demands of such models, which leads to slow, inefficient, and
memory-hungry basecallers. Therefore, there is a need to reduce the computation
and memory cost of basecalling while maintaining accuracy. Our goal is to
develop a comprehensive framework for creating deep learning-based basecallers
that provide high efficiency and performance. We introduce RUBICON, a framework
to develop hardware-optimized basecallers. RUBICON consists of two novel
machine-learning techniques that are specifically designed for basecalling.
First, we introduce the first quantization-aware basecalling neural
architecture search (QABAS) framework to specialize the basecalling neural
network architecture for a given hardware acceleration platform while jointly
exploring and finding the best bit-width precision for each neural network
layer. Second, we develop SkipClip, the first technique to remove the skip
connections present in modern basecallers to greatly reduce resource and
storage requirements without any loss in basecalling accuracy. We demonstrate
the benefits of RUBICON by developing RUBICALL, the first hardware-optimized
basecaller that performs fast and accurate basecalling. Compared to the fastest
state-of-the-art basecaller, RUBICALL provides a 3.96x speedup with 2.97%
higher accuracy. We show that RUBICON helps researchers develop
hardware-optimized basecallers that are superior to expert-designed models.
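QABAS jointly searches the network architecture and a per-layer bit-width; the full search is beyond the scope of an abstract, but the quantization side can be sketched. Below is an assumed, simplified symmetric fake-quantization routine of the kind such a search could evaluate per layer; the function name, the candidate bit-widths, and the scheme itself are illustrative and are not taken from RUBICON.

```python
import torch

def fake_quantize(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric uniform fake quantization: round to `bits`-wide integers and
    immediately dequantize, so training sees the precision loss in float."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    return torch.clamp(torch.round(w / scale), -qmax, qmax) * scale

# A toy per-layer exploration: measure how much each candidate bit-width
# perturbs a layer's weights, as a stand-in for a real accuracy/latency model.
layer_weights = torch.randn(256, 256)
for bits in (8, 6, 4, 2):                      # candidate precisions (assumed)
    err = (layer_weights - fake_quantize(layer_weights, bits)).abs().mean()
    print(f"{bits}-bit mean absolute error: {err:.4f}")
```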
Tailor: Altering Skip Connections for Resource-Efficient Inference
Deep neural networks use skip connections to improve training convergence.
However, these skip connections are costly in hardware, requiring extra buffers
and increasing on- and off-chip memory utilization and bandwidth requirements.
In this paper, we show that skip connections can be optimized for hardware when
tackled with a hardware-software codesign approach. We argue that while a
network's skip connections are needed for the network to learn, they can later
be removed or shortened to provide a more hardware efficient implementation
with minimal to no accuracy loss. We introduce Tailor, a codesign tool whose
hardware-aware training algorithm gradually removes or shortens a fully trained
network's skip connections to lower their hardware cost. Tailor improves
resource utilization by up to 34% for BRAMs, 13% for FFs, and 16% for LUTs for
on-chip, dataflow-style architectures. Tailor increases performance by 30% and
reduces memory bandwidth by 45% for a 2D processing element array architecture.
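To make the removal idea concrete, here is an assumed minimal sketch of one way a skip connection can be phased out during fine-tuning: scale the identity path by a factor that is annealed from one to zero, after which the path and its buffers can be dropped entirely. The block layout, the linear schedule, and the module names are illustrative assumptions, not Tailor's actual training algorithm.

```python
import torch
import torch.nn as nn

class FadingSkipBlock(nn.Module):
    """Residual block whose identity path is scaled by `alpha`. Annealing
    alpha from 1.0 to 0.0 during fine-tuning removes the skip connection,
    so its extra on-chip buffer is no longer needed at inference time."""

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.alpha = 1.0  # skip strength; 1.0 = normal residual, 0.0 = removed

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x) + self.alpha * x

def anneal_skips(blocks, epoch: int, start: int, duration: int) -> None:
    """Linearly fade every block's skip connection over `duration` epochs."""
    progress = min(max(epoch - start, 0) / duration, 1.0)
    for b in blocks:
        b.alpha = 1.0 - progress

# Usage sketch: fine-tune a trained model while the skips fade away.
blocks = [FadingSkipBlock(64) for _ in range(3)]
for epoch in range(10):
    anneal_skips(blocks, epoch, start=2, duration=5)
```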
Microscaling Data Formats for Deep Learning
Narrow bit-width data formats are key to reducing the computational and
storage costs of modern deep learning applications. This paper evaluates
Microscaling (MX) data formats that combine a per-block scaling factor with
narrow floating-point and integer types for individual elements. MX formats
balance the competing needs of hardware efficiency, model accuracy, and user
friction. Empirical results on over two dozen benchmarks demonstrate the
practicality of MX data formats as a drop-in replacement for baseline FP32 for
AI inference and training with low user friction. We also show the first
instance of training generative language models at sub-8-bit weights,
activations, and gradients with minimal accuracy loss and no modifications to
the training recipe.
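As a concrete picture of the per-block idea, the numpy sketch below encodes one block of FP32 values with a shared power-of-two scale and signed 8-bit integer elements. The 32-element block size, the scale selection, and the rounding are simplified assumptions for illustration and do not reproduce the exact MX specification.

```python
import numpy as np

def block_encode_int8(block: np.ndarray):
    """Encode one block of FP32 values as a shared power-of-two scale plus
    int8 elements, in the spirit of a per-block scaled narrow format."""
    max_abs = float(np.max(np.abs(block)))
    if max_abs == 0.0:
        return 1.0, np.zeros_like(block, dtype=np.int8)
    # Smallest power-of-two scale that keeps every element inside int8 range.
    scale = 2.0 ** np.ceil(np.log2(max_abs / 127.0))
    q = np.clip(np.round(block / scale), -127, 127).astype(np.int8)
    return scale, q

def block_decode(scale: float, q: np.ndarray) -> np.ndarray:
    return scale * q.astype(np.float32)

block = np.random.randn(32).astype(np.float32)   # one 32-element block (assumed size)
scale, q = block_encode_int8(block)
print("shared scale:", scale)
print("max reconstruction error:", np.max(np.abs(block - block_decode(scale, q))))
```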
Reshaping Deep Neural Networks for Efficient Hardware Inference
The latest Deep Learning (DL) methods for designing Deep Neural Networks (DNN) have significantly expanded our ability to train data processing systems. Coupled with exponential growth in available digital data, we have seen dramatic accuracy improvements in DNNs and widespread adoption of these models in different applications. This increased demand has motivated innovations in DNN architecture design to deliver high-quality output. For example, advanced DL models can include irregular connections between their layers, have more parameters, and employ computationally complex neurons. Unfortunately, these new architectural additions often increase the implementation complexity of the DNNs on hardware, particularly when deploying DL models for inference in scale-out and power-limited systems. Currently, to deploy a DNN on a custom platform, an abstract of the DL model is used to create a functionally identical realization. However, because altering this abstract changes the functionality of the DL model, hardware designers keep the model unchanged for a lossless implementation. This thesis shows that a co-design approach can improve the hardware implementation of DL models. In a co-design approach, the designer reshapes the DNN architecture to better fit a target processing platform and preserves its accuracy by retraining the model. We describe a custom accelerator for Spiking Neural Networks (SNN) with improved computational cost and memory utilization achieved by reshaping the layers and neurons of the model. We then apply these changes to existing SNN models and show that they can maintain their accuracy after the reshaping and retraining. In addition, we introduce novel applications for SNNs based on the new architecture. We also present a stochastic noise filter for pre-processing the SNN's input with improved accuracy and memory utilization. Furthermore, we explain a reshaping method for Residual Networks (ResNet) to reduce their memory footprint while preserving their accuracy. This thesis also introduces a method for accelerating the co-design process. Reshaping DL models can increase the complexity of their training stage. We present an auto-tuner for the learning rate (an essential parameter for training DNNs) that simplifies manual tuning of this parameter and can accelerate the retraining of DL models.
Benchmarking vision kernels and neural network inference accelerators on embedded platforms
Developing efficient embedded vision applications requires exploring various algorithmic optimization trade-offs and a broad spectrum of hardware architecture choices. This makes navigating the solution space and finding the design points with optimal performance trade-offs a challenge for developers. To help provide a fair baseline comparison, we conducted comprehensive benchmarks of accuracy, run-time, and energy efficiency of a wide range of vision kernels and neural networks on multiple embedded platforms: ARM A57 CPU, Nvidia Jetson TX2 GPU, and Xilinx ZCU102 FPGA. Each platform utilizes its optimized libraries for vision kernels (OpenCV, VisionWorks and xfOpenCV) and neural networks (OpenCV DNN, TensorRT and Xilinx DPU). For vision kernels, our results show that the GPU achieves an energy/frame reduction ratio of 1.1–3.2 compared to the others for simple kernels. However, for more complicated kernels and complete vision pipelines, the FPGA outperforms the others with energy/frame reduction ratios of 1.2–22.3. For neural networks [Inception-v2, ResNet-50, ResNet-18, Mobilenet-v2 and SqueezeNet], the FPGA achieves a speedup of [2.5, 2.1, 2.6, 2.9 and 2.5] and an EDP reduction ratio of [1.5, 1.1, 1.4, 2.4 and 1.7] compared to the GPU FP16 implementations, respectively. This is a manuscript of an article published as Qasaimeh, Murad, Kristof Denolf, Alireza Khodamoradi, Michaela Blott, Jack Lo, Lisa Halder, Kees Vissers, Joseph Zambreno, and Phillip H. Jones. "Benchmarking vision kernels and neural network inference accelerators on embedded platforms." Journal of Systems Architecture (2020): 101896. DOI: 10.1016/j.sysarc.2020.101896. Posted with permission.
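For readers unfamiliar with the metrics compared above, the short Python snippet below shows how an energy/frame reduction ratio and an energy-delay product (EDP) ratio are computed. All numbers are made up for illustration and are not measurements from the paper.

```python
# Hypothetical per-frame measurements (assumed values, not from the paper).
energy_per_frame_gpu = 30.0    # mJ/frame
energy_per_frame_fpga = 2.0    # mJ/frame
energy_reduction_ratio = energy_per_frame_gpu / energy_per_frame_fpga  # 15.0

latency_gpu, power_gpu = 0.020, 10.0     # s/frame, W
latency_fpga, power_fpga = 0.008, 5.0    # s/frame, W
# EDP = energy x delay = (power x latency) x latency
edp_gpu = (latency_gpu * power_gpu) * latency_gpu
edp_fpga = (latency_fpga * power_fpga) * latency_fpga
print(energy_reduction_ratio, edp_gpu / edp_fpga)   # EDP reduction ratio
```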
Epidemiological and clinical features of 2019 novel coronavirus diseases (COVID-19) in the South of Iran
BACKGROUND: In March 2020, the WHO declared the novel coronavirus (COVID-19) outbreak a global pandemic. Although the number of infected cases is increasing, information about its clinical characteristics in the Middle East, especially in Iran, a country considered to be one of the most important focal points of the disease in the world, is lacking. To date, there is no available literature on clinical data for COVID-19 patients in Iran. METHODS: In this multicenter retrospective study, 113 hospitalized confirmed cases of COVID-19 admitted to university-affiliated hospitals in Shiraz, Iran from February 20 to March 20 were enrolled. RESULTS: The mean age was 53.75 years and 71 (62.8%) were males. The most common symptoms at onset were fatigue (75: 66.4%), cough (73: 64.6%), and fever (67: 59.3%). Laboratory data revealed a significant correlation of lymphocyte count (P value = 0.003), partial thromboplastin time (P value = 0.000), and international normalized ratio (P value = 0.000) with the severity of the disease. The most common abnormality on chest CT scans was ground-glass opacity (77: 93.9%), followed by consolidation (48: 58.5%). Our results revealed an overall mortality rate of 8% (9 out of 113 cases), with the majority of deaths occurring among patients admitted to the ICU (5: 55.6%). CONCLUSION: Evaluating the clinical data of COVID-19 patients, finding the source of infection, and studying the behavior of the disease are crucial for understanding the pandemic.